Temporal networks come with a wide variety of heterogeneities, from burstiness of event sequences 
to correlations between timings of node and link activations. In this paper, we set out to explore the 
latter by using greedy walks as probes of temporal network structure. Given a temporal network 
(a sequence of contacts), greedy walks proceed from node to node by always following the first 
available contact. Because of this, their structure is particularly sensitive to temporal-topological 
patterns involving repeated contacts between sets of nodes. This becomes evident in their small 
coverage per step as compared to a temporal reference model - in empirical temporal networks, 
greedy walks often get stuck within small sets of nodes because of correlated contact patterns. 

While this may also happen in static networks that have pronounced community structure, the use 
of the temporal reference model takes the underlying static network structure out of the equation 
and indicates that there is a purely temporal reason for the observations. Further analysis of the 
structure of greedy walks indicates that burst trains, sequences of repeated contacts between node 
pairs, are the dominant factor. However, there are larger patterns too, as shown with non-backtracking 
greedy walks. We proceed further to study the entropy rates of greedy walks, and show 
that the sequences of visited nodes are more structured and predictable in original data as compared 
to temporally uncorrelated references. Taken together, these results indicate a richness of correlated 
temporal-topological patterns in temporal networks. 


I. INTRODUCTION 

When it comes to complex networks, temporal networks 
truly deserve to be called complex, because of their 
wide range of heterogeneities [T]. While they inherit common 
structural heterogeneities of static networks such 
as clustering and communities, they also exhibit purely 
temporal heterogeneities, e.g. burstiness of contact 
sequences [ng. There are also structures that could be 
categorised as temporal-topological, such as temporal 
subgraphs and motifs [3 ig that consist of rapid 
sequences of contacts within small sets of nodes. Temporal 
motifs can be viewed as a subset of an even larger class of 
higher-order temporal structures, where the contacts of a 
sequence are both temporally and structurally correlated 
and of a non-Markovian nature: future contacts 
depend on when and where past contacts happened. This 
class also includes triggered events, where events in the 
neighbourhood of a node are seen to frequently follow 
one another within a short period of time (see, e.g., nni), 
the phenomenon of betweenness preference m, where 
events typically follow certain local pathways, and the 
frequently occurring burst trains of events (“ping-pong 
patterns”) between pairs of nodes in communication 
networks [mils]. 

In this paper, we set out to investigate temporal-topological 
structures spanned by consecutive events between 
nodes. We introduce the concept of temporal greedy 
walks, walks that are purely determined by the sequence 
of events in the temporal network that acts as a 
substrate. We then use such walks as probes of temporal 
network structure. Temporal greedy walks have no 
counterpart in static networks. A greedy walker on a temporal 
network always follows the first event out of its current 
node. Therefore, the path it takes is particularly 
sensitive to temporal correlations involving adjacent links, in 
particular the above-mentioned non-Markovian contact 
sequences containing repeated contacts with small groups 
of nodes, from burst trains to temporal motifs. Because 
such temporal-topological patterns trap the walkers 
within these node groups, analysis of the structure of 
the paths taken by greedy walkers should reveal traces 
of these patterns (see Fig. 1a). In particular, comparing 
the properties of temporal greedy walks to their 
properties on reference networks, where such patterns have been 
removed with the help of time-stamp shuffling, allows 
estimating how dominant these patterns are in the 
temporal network structure. One advantage of this approach 
compared to e.g. the temporal motifs approach [Hi is 
that one does not need to specify patterns of interest 
beforehand; any sequence of contacts that immediately 
follow one another counts. 

Greedy walks are a limiting case of random walks [ni-nzi. 
For pre-determined temporal network structure, 
such as empirical contact lists with time stamps, temporal 
greedy walks are entirely deterministic once the 
initial conditions (first node, time) have been set, as long as 
nodes only participate in a single event at a time, which 
is mainly the case with our empirical data. Note that 
in some temporal-network models of random walks [H-[2n|, 
the walks themselves are in fact temporally greedy, 
and randomness only comes from a stochastic model of 
the underlying temporal network. Further, in studies 
of random walks on temporal networks the focus has 
mainly been on issues such as effects of burstiness on 
mean first passage and relaxation times [nin], models 
that generate temporal networks [21], and identification 
of timescales [HI [H]. We believe that our work 
is the first to apply temporal greedy walks to analysis 
of empirical data on temporal networks. We take “real 
time” out of the equation (cf. Ref. [24]) and focus on 
the structure of greedy walks, i.e. their order of visited 
nodes, step by step. We employ the commonly used 
time-shuffled reference model in the same spirit, as a reference 
model that yields walks with temporally random event 
sequences that are not affected by timing correlations. 

FIG. 1. The anatomy of a typical greedy walk, as illustrated 
with the first steps of a greedy walk in one of our data sets. 
a) The trajectory of the walk as a function of time. Each 
horizontal line indicates the timeline of a node, and the walk 
follows the pink path. The trapping effect typical for our data 
sets is clearly visible, as the walker repeatedly visits a small 
number of nodes. b) Coverage (the number of distinct nodes 
visited) as a function of steps taken for the same walk. 

In this paper, we first investigate the coverage of greedy 
walks in empirical temporal networks (see Figure 1b). As 
the coverage of a walk measures the number of unique 
nodes visited, measuring its growth as a function of the 
number of steps taken is a good way of revealing the 
existence of “traps”, where greedy walkers remain within a 
small set of nodes for prolonged times, because of burst 
trains, temporal motifs, and the like. The results of 
this analysis point out that there is an abundance of 
burst trains between pairs of nodes that dominate greedy 
walks. This is confirmed by the very high fraction of 
backtracking steps in the walks as compared to the 
reference model. Because of this, we next turn to non-backtracking 
walks, where the greedy walkers are not allowed 
to directly trace their last step back, and show that there 
are correlated temporal patterns beyond the burst trains. 
Finally, we apply an information-theoretic measure to the 
greedy walks, and show that for both ordinary and 
non-backtracking walks, the entropy rates of the walks are 
typically lower than in the reference networks. 

Data set            N        E           T          Δt 
E-mail 1 [H]        56,576   431,138     112        1 s 
E-mail 2 [H]        3,186    308,726     82 d       1 s 
FB* |2Zj            31,359   566,305     15,000 h   1 s 
Forum* [28]         6,625    1,359,075   2,400 d    1 s 
Hospital [29]       75       32,424      4 d        20 s 
Messages* [2S|      22,695   280,717     3 d        1 s 
Dating* [30]        17,009   185,578     250 d      1 s 
Reality [111I31|    64       13,131      8.6 h      5 m 

TABLE I. Properties of the temporal network data sets: N 
- the number of nodes, E - the number of events, T - time 
interval covered by the data, and Δt - time resolution of the 
data. Transient periods are removed from networks marked 
with an asterisk. 


II. DATA SETS AND SIMULATIONS 

We study simulated greedy walks on eight different 
temporal network data sets that contain time-stamped 
events between nodes. Six of the data sets are electronic 
records, ranging from e-mail communication (E-mail 1, E-mail 2) 
to Internet communities (FB, Forum, Messages, Dating), 
and two represent physical proximity (Reality, Hospital). 
For details, see Table I. 

FIG. 2. The average number of unique nodes covered as a function of the number of steps taken for greedy walks in the 
eight empirical temporal networks. Red lines denote average coverage in original data, whereas blue lines are for time-shuffled 
reference networks. Shaded areas indicate standard deviations. Note that event timestamps have only been used to determine 
the order of events that the greedy walks follow, and here we do not consider the times taken between consecutive steps. 
Therefore, the explanation for the smaller coverage in original data is that the sequences contain correlated temporal patterns 
and chains of repeating events within small sets of nodes. 

We have performed exhaustive simulations of greedy 
walks beginning at every node at time t = 0 and 
continued until the end of data, for the eight different 
empirical temporal networks detailed above. In each run, the 
greedy walker always follows the first available event out 
of its current node; if there are multiple simultaneous 
events (this happens in the physical proximity data sets), 
the walker randomly picks one (and therefore becomes a 
random walker for this particular step). The whole node 
sequence that the greedy walker follows is then recorded. 
Computationally, this is done by keeping track of the 
current location of the walker while looping through the 
time-ordered set of contacts of the temporal network, and 
moving the walker to the next available node once a 
corresponding contact (or set of contacts) is encountered. 
This makes computing one greedy walk scale as O(E), 
where E is the number of contact events. Note that our 
networks are small enough to allow exhaustive 
simulations even on a desktop computer; for larger data sets, 
randomly sampling starting nodes should be sufficient. 
In any case, the problem is embarrassingly parallel, and 
exhaustive computation for greedy walks on larger 
networks should be possible in a reasonable amount of time 
with a computing cluster. 
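
The simulation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function name `greedy_walk`, the `(t, u, v)` event format, and the treatment of contacts as undirected are assumptions made here for the example.

```python
import random
from itertools import groupby
from operator import itemgetter


def greedy_walk(events, start, rng=None):
    """Return the list of nodes visited by a greedy walker.

    events: iterable of (t, u, v) contacts (assumed undirected here).
    start:  node where the walker sits before the first event.
    """
    rng = rng or random.Random()
    by_time = itemgetter(0)
    path = [start]
    current = start
    # Loop once through the time-ordered contacts, one time stamp at a time.
    for _, group in groupby(sorted(events, key=by_time), key=by_time):
        # Neighbours reachable from the current node at this time stamp.
        exits = [v if u == current else u
                 for _, u, v in group if current in (u, v)]
        if exits:
            # A single exit gives a deterministic step; several simultaneous
            # exits are resolved by a uniformly random choice.
            current = rng.choice(exits)
            path.append(current)
    return path
```

The walker takes at most one step per time stamp. The sort is included only for safety; with a contact list that is already time-ordered, the remaining work is a single pass over the E events.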

For reference, similar simulations have been performed 
using time-shuffled reference networks where the time 
stamps of all events are randomly exchanged. This 
procedure retains the original number of events between all 
pairs of nodes but removes all temporal correlations 
between events on adjacent links. In the exhaustive 
simulations that cover all nodes as starting points, the networks 
have been re-shuffled for every 500 nodes in order to save 
computation time; further reshuffling would not 
qualitatively change the results. For the smallest Hospital and 
Reality networks, time-shuffling was repeated for each 
greedy walk. 
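
As an illustration of the time-shuffled reference, the sketch below randomly permutes the time stamps among the events while leaving the node pairs untouched. The function name `shuffle_timestamps` and the `(t, u, v)` event format are assumptions made for this example.

```python
import random


def shuffle_timestamps(events, rng=None):
    """Return a time-ordered copy of `events` with time stamps permuted.

    events: list of (t, u, v) contacts. Each node pair keeps its number of
    events and the set of time stamps is unchanged; only the assignment of
    time stamps to events is randomised.
    """
    rng = rng or random.Random()
    times = [t for t, _, _ in events]
    rng.shuffle(times)  # in-place random permutation of the time stamps
    shuffled = [(t, u, v) for t, (_, u, v) in zip(times, events)]
    shuffled.sort(key=lambda event: event[0])
    return shuffled
```

Calling the function again with a fresh random state gives an independent realisation of the reference network.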

III. RESULTS 
A. Coverage and burst trains 

We begin our analysis by investigating the coverage of 
greedy walks as a function of the number of steps, in all 
empirical networks. For Fig. 2, we have first counted the 
number of unique visited nodes as a function of the 
number of steps taken for each greedy walk, and then 
computed the average and standard deviation of this quantity 
for all numbers of steps. As the lengths of walks 
measured in steps show large variation, for some excessively 
long walks, we perform the measurement only up to the 
number of steps taken by at least 50 different walks in 
order to avoid a lack of statistics. Figure 2 displays this 
node coverage as a function of steps taken for all data 
sets (red lines), together with results for the time-shuffled 
reference model (blue lines). 
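
The coverage measurement can be sketched as follows. The names `coverage_curve` and `coverage_statistics` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which variant was used.

```python
from statistics import mean, pstdev


def coverage_curve(path):
    """Number of distinct nodes visited after 0, 1, 2, ... steps."""
    seen = set()
    curve = []
    for node in path:
        seen.add(node)
        curve.append(len(seen))
    return curve


def coverage_statistics(paths, min_walks=50):
    """Mean and standard deviation of coverage for each number of steps.

    Averaging stops at the first step count reached by fewer than
    `min_walks` walks, mirroring the cutoff of 50 walks used in the text.
    """
    curves = [coverage_curve(path) for path in paths]
    means, stds = [], []
    step = 0
    while True:
        values = [curve[step] for curve in curves if len(curve) > step]
        if not values or len(values) < min_walks:
            break
        means.append(mean(values))
        stds.append(pstdev(values))
        step += 1
    return means, stds
```

Index 0 of each returned list corresponds to the starting node, before any step has been taken.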

From Fig. 2 it is clear that greedy walks on top of 
empirical event sequences, on average, cover fewer nodes per 
step than in the uncorrelated reference models [52]. This 
means that the same nodes are visited more often in the 
empirical data compared to the time-shuffled reference 
sequences. Note that this difference between walks on 
the original networks and reference sequences only comes 
from temporal aspects like the non-Markovian nature of 
the original sequences - because the underlying static 
network is the same for both original and reference 
sequences, topological features that may trap walkers (e.g. 
communities) are equally present in both sequences. 
Further, the observation cannot be directly attributed to the 
presence of burstiness: because we count steps, not how 
long it takes to take them, the observed slow growth of 
coverage does not result from the slowing-down effects 
of burstiness that arise from long waiting times (as in 
e.g. m-). However, burst trains between node pairs and 
“ping-pong patterns” [12], that is, burstiness that resides 
on links, do play a role in shaping the paths of greedy 
walks, as we will see below. 

FIG. 3. The fraction of events through which at least one 
greedy walk passes, for each data set. 

Because of the effects of trapping and broad event 
frequency distributions on links, not all events are touched 
by greedy walks. For our exhaustive simulations, the 
proportion of events that have carried greedy walkers 
from one node to the next ranges from 45% in E-mail 1 to 
98% in Messages, see Figure 3. Moreover, counting the 
number of greedy walks passing through each event, one 
sees that the numbers of walkers per event are clearly 
broadly distributed (Fig. 4). This means that only a 
few walks pass through most events, and thus not all 
walks get trapped to the same paths. However, there 
are some (rare) central events, and consequently central 
paths, through which many greedy walks pass. Note that 
after passing through the same event, all greedy walks 
follow the same path to either the end of the walk, or until 
there are multiple simultaneous exits from a node. 
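
Counting the walks that pass through each event can be sketched as below. This is a simplified illustration under stated assumptions: `events` is a time-ordered list of `(t, u, v)` contacts treated as undirected, no node takes part in two events with the same time stamp (so every walk is deterministic), and the function names are hypothetical.

```python
from collections import Counter


def walks_per_event(events, start_nodes):
    """Map event index -> number of greedy walks that traverse that event."""
    counts = Counter()
    for start in start_nodes:
        current = start
        for index, (_, u, v) in enumerate(events):
            # Follow the event only if the walker sits at one of its ends.
            if current == u:
                current = v
            elif current == v:
                current = u
            else:
                continue
            counts[index] += 1
    return counts


def fraction_of_events_used(events, start_nodes):
    """Fraction of events through which at least one walk passes."""
    if not events:
        return 0.0
    return len(walks_per_event(events, start_nodes)) / len(events)
```

Because two walkers that traverse the same event occupy the same node afterwards, their counts stay identical for all later events under the no-simultaneous-events assumption.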

The low node coverage of greedy walks discussed above 
can in principle result from any temporal-topological 
correlations that limit the number of nodes visited by 
walkers, from burst trains on links to larger patterns of 
repeated consecutive contacts between nodes that cause 
the walks to fold back on already visited nodes. In 
order to quantify the role of the first (burst trains), we 
compute the total fraction of backtracking steps, where 
the walker directly returns to the node from which it 
arrived (e.g. ABA), for all data sets. These fractions 
are shown in Fig. 5a for both the original data and 
the time-shuffled reference sequences. Fractions of 
backtracking steps range from 29% to 67%, while they are 
much lower in the reference sequences [33]. Therefore, it 
is evident that the back-and-forth ping-pong patterns of 
burst trains that trap walkers play a major role in the 
low coverage of greedy walks. Furthermore, because of 
their surprisingly high abundance, burst trains can with 
high likelihood be expected to play an important role in 
other types of dynamical processes that unfold on 
temporal networks as well. 

FIG. 4. The probability density function for the number of 
greedy walks that pass through each event, for each data set. 


B. Non-backtracking greedy walks 

FIG. 5. a) The fraction of backtracking steps where the 
greedy walker immediately returns to the node it arrived 
from, in original and time-shuffled data sets. b) The 
fraction of triangle-closing steps (greedy walker returns to the 
node where it was two steps ago) in non-backtracking greedy 
walks, in original and time-shuffled data sets. 

Because burst trains in the shape of repeated contacts 
between node pairs are clearly a dominant factor in 
determining the coverage of greedy walks, we next deliberately 
disallow such patterns in order to understand the 
importance of larger temporal-topological structures. To this 
end, we simulate greedy walks that have one additional 
rule - walkers always follow the next event out of a node 
that does not lead back to the previous node. This means 
that these non-backtracking greedy walks are not allowed 
to follow burst trains between two nodes; however, they 
may become trapped by any larger-scale patterns, from 
triangles (ABCA) to larger temporal motifs. 
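
The additional rule can be illustrated with a small variation of the greedy walker. As before, this is a sketch under assumptions rather than the simulation code of the study: contacts are `(t, u, v)` tuples treated as undirected, and the function name is hypothetical.

```python
import random
from itertools import groupby
from operator import itemgetter


def non_backtracking_greedy_walk(events, start, rng=None):
    """Greedy walk that never steps straight back to the previous node."""
    rng = rng or random.Random()
    by_time = itemgetter(0)
    path = [start]
    current, previous = start, None
    for _, group in groupby(sorted(events, key=by_time), key=by_time):
        exits = [v if u == current else u
                 for _, u, v in group if current in (u, v)]
        # Discard contacts that would lead back to the node just left.
        exits = [node for node in exits if node != previous]
        if exits:
            previous, current = current, rng.choice(exits)
            path.append(current)
    return path
```

For the contact sequence A-B, B-A, A-B, B-C, C-A and a walker starting at A, the two repeated A-B contacts are skipped and the walk closes the triangle A, B, C, A.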

Figure 6 displays the average coverage of non-backtracking 
walkers as a function of the number of steps 
taken, for the original data and the reference sequences. 
Here, it is seen that for most data sets, there is still a 
difference, and the coverage grows more slowly in the 
original data (however, the difference is clearly smaller 
than for ordinary greedy walks). Thus, there are 
larger-scale topological-temporal structures (such as temporal 
triangles) in the original data that trap greedy walkers, 
albeit less frequently than the burst trains for ordinary 
walks. For Reality, the difference is in the opposite 
direction: coverage in the original data grows faster than 
in the reference model. This is related to peculiarities of 
this small network. A visual analysis shows that it 
contains dense subnetworks with frequent events and lots of 
triangles; these dense subnetworks are active at different 
times, forcing non-backtracking greedy walkers to jump 
from one subnetwork to the next. In the time-shuffled 
reference networks, all subnetworks are active at all times 
and their abundance of triangles guarantees that 
walkers can repeatedly visit the same nodes (see also Fig. 5b: 
the fraction of triangle-closing steps is very high in the 
shuffled reference for Reality). 

Similarly to the fraction of backtracking steps in 
ordinary greedy walks, we have computed the total fraction 
of triangle-closing steps in non-backtracking greedy walks 
for all data sets. These are steps that lead the walker to 
the node where it was two steps ago, e.g. the final step 
in ABCA is a triangle-closing step. The fraction of such 
steps is displayed in Fig. 5b for all data sets. It is seen 
that although the fraction of triangle-closing steps is in 
general lower than that of backtracking steps, there is 
nevertheless a consistent difference between greedy walk 
structures in the original and shuffled data sets. This is 
indicative of the existence of larger temporal-topological 
structures from triangles to other motifs. 
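
Both the backtracking fraction of Sec. III A and the triangle-closing fraction can be computed from the recorded node sequences with one small helper. The sketch below uses a hypothetical function `return_step_fraction`; normalising by the total number of steps over all walks is an assumption, since the text only speaks of a "total fraction".

```python
def return_step_fraction(paths, lag):
    """Fraction of steps that end at the node visited `lag` positions earlier.

    lag=2 counts backtracking steps (the last step of ABA);
    lag=3 counts triangle-closing steps (the last step of ABCA).
    """
    steps = 0
    returns = 0
    for path in paths:
        steps += max(len(path) - 1, 0)
        returns += sum(1 for i in range(lag, len(path))
                       if path[i] == path[i - lag])
    return returns / steps if steps else 0.0
```

For the single walk A, B, A, B, C, D, two of the five steps are backtracking steps, so `return_step_fraction` with `lag=2` gives 0.4.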


C. Entropy rates of greedy walks 

We conclude by investigating the structure of greedy 
walks in more detail, and focus on quantifying the 
amount of repeated sequences in greedy walks. To this 
end, we apply information-theoretic measures along the 
lines of Refs. [331ISD-. Specifically, we estimate the 
entropy rates S of all greedy walks for both original and 
time-shuffled data following the approach of Song et 
al. [33]. The entropy rate of a sequence of symbols is 
defined as 


\[ S = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} S(i), \tag{1} \]

where S(i) is the conditional entropy of the i'th step. For 
finite strings, one can estimate the entropy rate using the 
Lempel-Ziv estimator 



\[ S \approx \left( \frac{1}{\ell} \sum_{i=1}^{\ell} \Lambda_i \right)^{-1} \log \ell, \tag{2} \]


where ℓ is the length of the sequence, and Λ_i is the length 
of the shortest subsequence of visited nodes starting at 
step i that does not appear previously in the sequence. 
This estimator represents the asymptotic lower bound on 
the per-symbol description length when a realization of 
a stationary ergodic process is losslessly compressed [36]. 
The estimator converges to S when n → ∞ if the source 
of the sequence is a stationary Markov chain of finite 
order [57] IMj; note that for non-Markovian sequences 
such as studied here convergence is not necessarily 
guaranteed. Because the formula assumes the sequence to 
be one-sided infinite (in order to compute the length of 
the shortest novel subsequence at step i), we have taken 
greedy walks of L > 20 steps and computed the 
Lempel-Ziv estimator for ℓ = L/2, i.e. the first half of the walk. 

FIG. 6. The number of unique nodes covered as a function of the number of steps taken for non-backtracking greedy walks. 
Red lines denote coverage in original data, whereas blue lines are for time-shuffled reference networks. 
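
A Lempel-Ziv estimate of this kind can be sketched as follows. The function names are hypothetical, and several details are assumptions rather than statements about the original computation: ℓ is taken as half the number of recorded nodes (rounded down), a subsequence counts as "appearing previously" only if it lies entirely before position i, and the natural logarithm is used. The estimate is ℓ log ℓ divided by the sum of the Λ_i over the first ℓ positions.

```python
import math


def _occurs_in(sub, prefix):
    """True if `sub` appears as a contiguous block inside `prefix`."""
    k = len(sub)
    return any(prefix[j:j + k] == sub for j in range(len(prefix) - k + 1))


def lempel_ziv_entropy_rate(walk):
    """Lempel-Ziv entropy-rate estimate from the first half of a node sequence."""
    seq = tuple(walk)
    ell = len(seq) // 2
    if ell < 1:
        raise ValueError("the walk is too short for an estimate")
    total = 0
    for i in range(ell):
        prefix = seq[:i]
        k = 1
        # Grow the block starting at i until it is no longer found earlier.
        # Matches may extend into the second half of the walk.
        while i + k <= len(seq) and _occurs_in(seq[i:i + k], prefix):
            k += 1
        total += k  # this is Lambda_i
    return ell * math.log(ell) / total
```

A walk that never revisits a node has Λ_i = 1 everywhere and yields log ℓ, while a walk that bounces between two nodes yields a much smaller value. The direct substring search is quadratic or worse in the walk length, which is adequate for a sketch but not for long walks.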

The PDFs for the entropy rates of greedy walks are 
displayed in Fig. 7. Clearly, on average the entropy rates 
for greedy walks that follow the original event sequences 
are lower than for time-shuffled data, indicating more 
structured walks with repeated and more predictable 
node sequences. In fact, this is a direct consequence of 
the behavior of the coverage as a function of steps taken 
(Fig. 2) - slower-growing coverage implies lower entropy 
rate of the sequence. The same applies to the fraction of 
backtracking steps (Fig. 5a) - frequent backtracking steps 
imply high predictability and low entropy. 

We have repeated the same analysis for non-backtracking 
walks (Fig. 8), with a result that is in line with the 
coverages (Fig. 6) - entropy rates of most original data 
sets are still clearly below their time-shuffled 
counterparts (with the exception of Reality and a vanishingly 
small difference for E-mail 1). However, the differences 
are less pronounced than for ordinary greedy walks. 


IV. CONCLUSIONS 


Studying greedy walks is a way to understand 
temporal networks, complementing studies of e.g. 
time-respecting paths and communicability [33] j spreading 
phenomena [3110], and temporal motifs laiH]. In this 
work, we have used two types of greedy walks, with 
and without backtracking, to probe the structure of 
time and topology in empirical temporal networks. We 
have shown that for all our data sets, greedy walks get 
trapped in non-Markovian temporal-topological 
structures. The clearest example is ping-pong patterns, or 
burst trains [3 iia> of steps back and forth between 
two nodes. Studying the coverage statistics of 
non-backtracking walks can indicate the existence of more complex 
temporal-topological patterns. Also in this case, for most 
data sets, there are clear differences between greedy walks 
on real data and random null models. For example, in our 
data sets Forum and Hospital, there is a very strong 
suppression of the coverage for the non-backtracking walks. 
This is also reflected in an over-representation of 
triangle-closing steps in these data sets. In Forum, there are 
triangles arising from discussions within groups of three 
persons (members of an Internet community); in the Hospital 
data, triangles can come from two health care workers 
(a physician and a nurse, or two nurses) visiting a patient. 
By measuring the entropy rate, we put these observations 
on an information-theoretic basis. The conclusions from 
this, for our particular data sets, are the same as from 
the coverage statistics. 

We believe that greedy walks are useful as a tool for 
exploring and probing temporal networks. This is not 
because they mimic important processes - in practice, 
e.g. spreading processes and synchronization are 
probably more important dynamics on temporal networks. 
Rather, by imposing constraints on walks as done 
here, one can explore and probe temporal-topological 
structures in a controlled way, similarly to randomizing 
temporal networks in successively more restrictive ways 
to isolate important structures [T]. 

FIG. 7. Probability density functions for estimated entropy rates S of greedy walks for all data sets. The red lines denote 
PDFs for walks on top of original event sequences, whereas blue lines are again for time-shuffled reference data. 

FIG. 8. Probability density functions for estimated entropy rates S of non-backtracking greedy walks for all data sets. The red 
lines denote PDFs for walks on top of original event sequences, whereas blue lines are again for time-shuffled reference data. 








































